We propose a deep learning method for three-dimensional reconstruction in low-dose helical cone-beam computed tomography. We reconstruct the volume directly, i.e., not from 2D slices, guaranteeing consistency along all axes. In a crucial step beyond prior work, we train our model in a self-supervised manner in the projection domain using noisy 2D projection data, without relying on 3D reference data or the output of a reference reconstruction method. This means the fidelity of our results is not limited by the quality and availability of such data. We evaluate our method on real helical cone-beam projections and simulated phantoms. Our reconstructions are sharper and less noisy than those of previous methods, and several decibels better in quantitative PSNR measurements. When applied to full-dose data, our method produces high-quality results orders of magnitude faster than iterative techniques.
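Time-lapse image sequences offer visually compelling insights into dynamic processes that are too slow to observe in real time. However, playing back a long time-lapse sequence as a video often results in distracting flicker due to random effects, such as weather, as well as cyclic effects, such as the day-night cycle. We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal. This enables us to "re-render" the sequences in ways that would not be possible with the input images alone. For example, we can stabilize a long sequence to focus on plant growth under selectable, consistent weather. Our approach is based on generative adversarial networks (GANs) that are conditioned on the time coordinate of the time-lapse sequence. Our architecture and training procedure are designed so that the networks learn to model random variations, such as weather, using the GAN's latent space, and to disentangle overall trends and cyclic variations by feeding the conditioning time label to the model using Fourier features with specific frequencies. We show that our models are robust to defects in the training data, enabling us to amend some of the practical difficulties in capturing long time-lapse sequences, such as temporary occlusions, uneven frame spacing, and missing frames.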
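As a rough illustration of conditioning on time through Fourier features with specific frequencies, the sketch below encodes a timestamp as sine/cosine pairs at chosen periods. The function name `fourier_time_features`, the unit (days), and the default periods (one day, one year) are assumptions made for this example and are not taken from the abstract.

```python
import numpy as np

def fourier_time_features(t_days, periods_days=(1.0, 365.25)):
    """Encode timestamps as sin/cos pairs, one pair per chosen period.

    t_days:       timestamps measured in days (assumed unit).
    periods_days: cycle lengths to expose to the model (assumed values:
                  a day-night cycle and a yearly cycle).
    Returns an array of shape (len(t_days), 2 * len(periods_days)).
    """
    t = np.asarray(t_days, dtype=np.float64)
    feats = []
    for period in periods_days:
        phase = 2.0 * np.pi * t / period
        feats.append(np.sin(phase))
        feats.append(np.cos(phase))
    return np.stack(feats, axis=-1)

# Two timestamps exactly one day apart get identical day-cycle features,
# so a model conditioned on them sees the same "time of day".
features = fourier_time_features([0.0, 0.25, 1.0], periods_days=(1.0,))
```

Because each feature pair is periodic, holding one pair fixed while varying another is one plausible way to control a cycle separately from the rest of the time signal.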
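We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time while maintaining the consistency expected in real environments, such as plausible dynamics and object persistence. A common failure case is that content never changes, due to over-reliance on inductive biases to provide temporal consistency, such as a single latent code that dictates the content of the entire video. At the other extreme, without long-term consistency, generated videos may morph unrealistically between different scenes. To address these limitations, we prioritize the time axis by redesigning the temporal latent representation and learning long-term consistency from data by training on longer videos. To this end, we use a two-phase training strategy, in which we train separately on longer videos at low resolution and on shorter videos at high resolution. To evaluate the capabilities of our model, we introduce two new benchmark datasets with an explicit focus on long-term temporal dynamics.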
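Fréchet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies and visualize what FID "looks at" in generated images. We show that the feature space in which FID is (typically) computed is so close to the ImageNet classifications that aligning the histograms of top-$N$ classifications between sets of generated and real images can reduce FID substantially, without actually improving the quality of the results. We therefore conclude that FID is prone to intentional or accidental distortions. As a practical example of an accidental distortion, we discuss a case where an ImageNet pre-trained FastGAN achieves an FID comparable to StyleGAN2 while being worse in terms of human evaluation.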
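For reference, the Fréchet distance underlying FID compares Gaussian fits to two feature sets: $\lVert \mu_a - \mu_b \rVert^2 + \mathrm{Tr}\big(\Sigma_a + \Sigma_b - 2(\Sigma_a \Sigma_b)^{1/2}\big)$. The sketch below computes it with NumPy for arbitrary feature arrays. In practice the features would come from a pre-trained Inception network, which is not included here, and the function name `frechet_distance` is chosen for this example only.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits of two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices; clip guards against tiny negative round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
shifted = real + np.array([1.0, 2.0, 0.0, 0.0])  # same covariance, new mean
distance = frechet_distance(real, shifted)
```

Only the first two moments of the feature distribution enter the formula, so any change that moves those moments in feature space changes the score, whether or not it changes perceived image quality.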
Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.
This paper describes a simple technique to analyze Generative Adversarial Networks (GANs) and create interpretable controls for image synthesis, such as change of viewpoint, aging, lighting, and time of day. We identify important latent directions based on Principal Component Analysis (PCA) applied either in latent space or feature space. Then, we show that a large number of interpretable controls can be defined by layer-wise perturbation along the principal directions. Moreover, we show that BigGAN can be controlled with layer-wise inputs in a StyleGAN-like manner. We show results on different GANs trained on various datasets, and demonstrate good qualitative matches to edit directions found through earlier supervised approaches.
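A minimal sketch of the idea, assuming latent codes are available as a NumPy array: PCA (via SVD) yields the principal directions, and an edit adds a scaled direction to the latent codes of selected layers only. The helper names `principal_directions` and `edit_layers`, and the per-layer latent layout `(num_layers, dim)`, are hypothetical and chosen for illustration.

```python
import numpy as np

def principal_directions(latents, num_components):
    """PCA of sampled latent codes, shape (num_samples, dim).

    Returns the mean, the top principal directions (unit vectors, one per
    row), and the standard deviation of the data along each direction.
    """
    mean = latents.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(latents - mean,
                                           full_matrices=False)
    stddev = singular_values / np.sqrt(len(latents) - 1)
    return mean, vt[:num_components], stddev[:num_components]

def edit_layers(layer_latents, direction, strength, layers):
    """Shift the latent code of the chosen layers along one direction.

    layer_latents: array of shape (num_layers, dim), one code per layer.
    layers:        list of layer indices to perturb; others are untouched.
    """
    edited = layer_latents.copy()
    edited[layers] += strength * direction
    return edited

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 8)) * np.array([10, 1, 1, 1, 1, 1, 1, 1])
mean, directions, stddev = principal_directions(samples, num_components=2)
per_layer = np.tile(mean, (4, 1))
edited = edit_layers(per_layer, directions[0], 2.0 * stddev[0], layers=[0, 1])
```

Scaling the step by the standard deviation along the direction keeps edits within the range the sampled latents actually cover.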
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.
The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research. We present an evaluation metric that can separately and reliably measure both of these aspects in image generation tasks by forming explicit, non-parametric representations of the manifolds of real and generated data. We demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing several illustrative examples where existing metrics yield uninformative or contradictory results. Furthermore, we analyze multiple design variants of StyleGAN to better understand the relationships between the model architecture, training methods, and the properties of the resulting sample distribution. In the process, we identify new variants that improve the state-of-the-art. We also perform the first principled analysis of truncation methods and identify an improved method. Finally, we extend our metric to estimate the perceptual quality of individual samples, and use this to study latent space interpolations.
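One way to form an explicit, non-parametric manifold estimate is to place a ball around each feature vector with radius equal to the distance to its k-th nearest neighbor, and to treat the union of balls as the manifold. The sketch below makes that assumption; the abstract does not specify the construction, so the k-nearest-neighbor balls, the value of `k`, and all function names here are illustrative choices. Precision is then the fraction of generated samples that fall inside the real manifold, and recall is the fraction of real samples that fall inside the generated one.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbor in the set."""
    dists = pairwise_distances(points, points)
    # After sorting each row, index 0 is the point itself (distance 0),
    # so index k is the k-th nearest other point.
    return np.sort(dists, axis=1)[:, k]

def manifold_coverage(queries, reference, k=3):
    """Fraction of queries lying inside at least one reference ball."""
    radii = knn_radii(reference, k)
    dists = pairwise_distances(queries, reference)
    return float(np.mean(np.any(dists <= radii[None, :], axis=1)))

def precision_recall(real, fake, k=3):
    precision = manifold_coverage(fake, real, k)  # sample quality
    recall = manifold_coverage(real, fake, k)     # coverage of real data
    return precision, recall

rng = np.random.default_rng(0)
real = rng.normal(size=(100, 2))
# A collapsed generator: every sample sits almost on top of one real point.
collapsed = real[0] + 1e-6 * rng.normal(size=(100, 2))
precision, recall = precision_recall(real, collapsed)
```

In the collapsed example the generated samples are realistic but cover almost none of the real data, which is the kind of case a single combined score cannot separate.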
We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CELEBA images at $1024^2$. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CELEBA dataset.